Comparison of Spectral Methods for Spoken Language Identification
Authors
Abstract:
Automatic spoken language identification is the task of determining the language of an utterance from the speech signal. Language identification systems fall into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as multi-dimensional feature vectors, and a statistical model of these features is then trained for each language. The Gaussian mixture model (GMM) is the most common statistical model in spectral-based language identification systems. In phonetic-based methods, by contrast, the speech signal is converted into a sequence of tokens using a hidden Markov model (HMM), and a language model is trained on the resulting sequence. PRLM, PPRLM, and PR-SVM are examples of phonetic-based methods. In the research literature, phonetic-based and spectral-based systems are usually combined to achieve a high-quality language identification system. Spectral-based methods have been a focus of researchers because they do not need labeled data and usually achieve better results than phonetic approaches. Therefore, in this paper, different spectral methods for spoken language identification are introduced, implemented, and compared. The baseline spectral language identification method is the Gaussian Mixture Model-Universal Background Model (GMM-UBM). In this paper, maximum mutual information (MMI) discriminative training is used to improve the Gaussian model of each language. Moreover, in order to model the dynamics of a language, the GMM is replaced with an ergodic hidden Markov model (EHMM). The GSV-SVM and GMM tokenizer methods are also implemented as two popular spectral approaches. In addition, recent speaker and channel variation modeling methods are applied to language identification, including joint factor analysis (JFA), the identity vector (i-Vector), and several variation compensation methods used to improve the i-Vector results.
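The spectral approach described above (one statistical model of short-time feature vectors per language, with the GMM as the usual choice) can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs whose parameters are already trained; the toy models, the two-dimensional feature vectors, and all function names are hypothetical and are not taken from the paper. It shows plain per-language GMM scoring only, not the UBM adaptation, MMI training, or any of the other systems compared in the study.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one language model."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)

def identify(frames, models):
    """Return the best-scoring language and the raw score of every language."""
    scores = {lang: utterance_score(frames, gmm) for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy models: two mixture components per language, 2-D features.
models = {
    "farsi": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "english": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]  # made-up feature vectors
best, scores = identify(frames, models)
print(best, scores)
```

The dictionary of per-language scores returned by `identify` plays the role of a raw score vector, which a real system would pass to a score post-processing stage before making a decision.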
Furthermore, in order to boost the performance of language recognition systems, different score post-processing methods are applied. Each element of the raw score vector indicates the degree to which the spoken signal belongs to a language. A post-processing method is applied to this vector as a classifier and allows better language detection decisions by mapping the raw score vector to the space of the desired languages. Different studies have employed different post-processing methods, including GMM, NN, SVM, and LLR. This study uses several score post-processing methods to improve the quality of language recognition. The goal of the experiments in this article is to detect Farsi, English, and Arabic, individually and simultaneously, and to distinguish them from other languages. The latter is also called open-set language identification. The signals considered in this paper are two-sided conversations, whose quality is usually poor because of strong noise, background speech or music, accents, etc. GMM-UBM was implemented as the baseline method. With this approach, the mean EER of the three target languages (Farsi, English, and Arabic) was 13.58. Experimental results indicated that training the GMM language identification system with the MMI discriminative training algorithm is more effective than training it with the maximum likelihood (ML) algorithm alone. More specifically, the mean EER of the three target languages was reduced by about 8 percent in comparison to GMM-UBM. The GMM tokenizer method was also tested as a newer spectral approach. With this method, the mean EER of the three target languages was about 5 percent better than with GMM-UBM. In this study, the GSV-SVM discriminative method was also used for language recognition.
The results of GSV-SVM were considerably better than those of common spectral approaches: the mean EER of the three target languages was reduced by 11 percent in comparison to GMM-UBM. This study addresses the low speed of this method using a model pushing method. Two recent methods, JFA and i-Vector, were also implemented. According to the results, both provide better results than GMM-UBM: the mean EER values of the three target languages for JFA and i-Vector are reduced by 1% and 12%, respectively. Overall, the experimental results showed that i-Vector outperforms the other spectral language identification systems. This study is the result of seven years of research on spoken language identification at the advanced technology development center of Khajeh Nasiredin Tousi. Ongoing research includes studying and implementing newer spectral language identification algorithms such as PLDA, as well as state-of-the-art phonetic language identification methods, in order to combine the spectral and phonetic systems and eventually achieve a high-quality language identification system.
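The systems above are compared by their equal error rate (EER), the operating point at which the miss rate on target-language trials equals the false-alarm rate on non-target trials. The following is a minimal sketch of one common way to estimate it from detection scores, assuming a simple threshold sweep over the observed scores. The function name and the example scores are hypothetical; the paper's own scoring tools are not described in the abstract.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The estimate is the
    average of miss and false-alarm rates at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer

# Made-up detection scores for one target language.
target = [2.0, 1.5, 0.4, 3.0]       # trials that really are the target language
nontarget = [0.1, 0.5, -1.0, 0.2]   # trials from other languages
print(equal_error_rate(target, nontarget))
```

A mean EER over several target languages, as reported above, would then be the average of this value computed separately for each language's detector.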
Journal title
volume 14 issue 1
pages 111- 134
publication date 2017-06